19 research outputs found

    The need for open source software in machine learning

    Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not used, since existing implementations are not openly shared, resulting in software with low usability and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to publish their software under an open source model. Additionally, we outline the problems authors are faced with when trying to publish algorithmic implementations of machine learning methods. We believe that a resource of peer-reviewed software accompanied by short articles would be highly valuable to both the machine learning and the general scientific community.

    Assessing the gene regulatory landscape in 1,188 human tumors

    Cancer is characterised by somatic genetic variation, but the effect of the majority of non-coding somatic variants and the interface with the germline genome are still unknown. We analysed the whole genome and RNA-seq data from 1,188 human cancer patients as provided by the Pan-cancer Analysis of Whole Genomes (PCAWG) project to map cis expression quantitative trait loci of somatic and germline variation and to uncover the causes of allele-specific expression patterns in human cancers. The availability of the first large-scale dataset with both whole genome and gene expression data enabled us to uncover the effects of the non-coding variation on cancer. In addition to confirming known regulatory effects, we identified novel associations between somatic variation and expression dysregulation, in particular in distal regulatory elements. Finally, we uncovered links between somatic mutational signatures and gene expression changes, including TERT and LMO2, and we explained the inherited risk factors in APOBEC-related mutational processes. This work represents the first large-scale assessment of the effects of both germline and somatic genetic variation on gene expression in cancer and creates a valuable resource cataloguing these effects.

    Integrated multi-omics reveals anaplerotic rewiring in methylmalonyl-CoA mutase deficiency

    Multi-layered omics approaches can help define relationships between genetic factors, biochemical processes and phenotypes, thus extending research of inherited diseases beyond identifying their monogenic cause [1]. We implemented a multi-layered omics approach for the inherited metabolic disorder methylmalonic aciduria (MMA). We performed whole genome sequencing, transcriptomic sequencing, and mass spectrometry-based proteotyping from matched primary fibroblast samples of 230 individuals (210 affected, 20 controls) and related the molecular data to 105 phenotypic features. Integrative analysis identified a molecular diagnosis for 84% (177/210) of affected individuals, the majority (148) of whom had pathogenic variants in methylmalonyl-CoA mutase (MMUT). Untargeted analysis of all three omics layers revealed dysregulation of the TCA cycle and surrounding metabolic pathways, a finding that was further corroborated by multi-organ metabolomics of a hemizygous Mmut mouse model. Integration of phenotypic disease severity indicated downregulation of oxoglutarate dehydrogenase and upregulation of glutamate dehydrogenase, two proteins involved in glutamine anaplerosis of the TCA cycle. The relevance of disturbances in this pathway was supported by metabolomics and isotope tracing studies, which showed decreased glutamine-derived anaplerosis in MMA. We further identified MMUT to physically interact with both oxoglutarate dehydrogenase complex components and glutamate dehydrogenase, providing evidence for a multi-protein metabolon that orchestrates TCA cycle anaplerosis. This study emphasizes the utility of a multi-modal omics approach to investigate metabolic diseases and highlights glutamine anaplerosis as a potential therapeutic intervention point in MMA.
    Take home message: Combination of integrative multi-omics technologies with clinical and biochemical features leads to an increased diagnostic rate compared to genome sequencing alone and identifies anaplerotic rewiring as a targetable feature of the rare inborn error of metabolism methylmalonic aciduria.

    Robustes Boosting durch konvexe Optimierung (Robust boosting via convex optimization)

    An Introduction to Boosting and Leveraging

    We provide an introduction to theoretical and practical aspects of Boosting and Ensemble learning, providing a useful reference for researchers in the field of Boosting as well as for those seeking to enter this fascinating area of research. We begin with a short background concerning the necessary learning theoretical foundations of weak learners and their linear combinations. We then point out the useful connection between Boosting and the Theory of Optimization, which facilitates the understanding of Boosting and later on enables us to move on to new Boosting algorithms, applicable to a broad spectrum of problems. In order to increase the relevance of the paper to practitioners, we have added remarks, pseudo code, "tricks of the trade", and algorithmic considerations where appropriate. Finally, we illustrate the usefulness of Boosting algorithms by giving an overview of some existing applications. The main ideas are illustrated on the problem of binary classification, although several extensions are discussed.

    On the convergence of leveraging

    Constructing boosting algorithms from SVMs: an application to one-class classification

    We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm, one-class leveraging, starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.